Overview

Dataset statistics

Number of variables: 12
Number of observations: 2935849
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 6
Duplicate rows (%): < 0.1%
Total size in memory: 235.2 MiB
Average record size in memory: 84.0 B

Variable types

NUM: 10
DATE: 1
CAT: 1

Warnings

Dataset has 6 (< 0.1%) duplicate rows [Duplicates]
year is highly correlated with date_block_num [High correlation]
date_block_num is highly correlated with year [High correlation]
yrday is highly correlated with month [High correlation]
month is highly correlated with yrday [High correlation]
item_cnt_day is highly skewed (γ1 = 272.8331617) [Skewed]
date_block_num has 115690 (3.9%) zeros [Zeros]
weekday has 337074 (11.5%) zeros [Zeros]

Reproduction

Analysis started: 2020-09-24 20:50:10.637737
Analysis finished: 2020-09-24 20:57:58.824405
Duration: 7 minutes and 48.19 seconds
Software version: pandas-profiling v2.9.0
Download configuration: config.yaml

Variables

date
Date

Distinct: 1034
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 22.4 MiB
Minimum: 2013-01-01 00:00:00
Maximum: 2015-10-31 00:00:00
[Figure: histogram with fixed size bins (bins=50)]

date_block_num
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 34
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 14.56991146
Minimum: 0
Maximum: 33
Zeros: 115690
Zeros (%): 3.9%
Memory size: 22.4 MiB

Quantile statistics

Minimum: 0
5th percentile: 1
Q1: 7
median: 14
Q3: 23
95th percentile: 31
Maximum: 33
Range: 33
Interquartile range (IQR): 16
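The quantile rows are tied together by simple arithmetic: Range = Maximum - Minimum (33 - 0 = 33) and IQR = Q3 - Q1 (23 - 7 = 16). The sketch below computes these figures for a small list. It assumes linear interpolation between order statistics, which is the default method in NumPy and pandas. The report does not say which quantile method it uses, and the function names are illustrative.

```python
import math

def quantile_linear(sorted_values: list[float], q: float) -> float:
    """Quantile q in [0, 1] by linear interpolation between order statistics."""
    pos = (len(sorted_values) - 1) * q
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * frac

def quantile_summary(values: list[float]) -> dict[str, float]:
    """Minimum, quartiles, maximum, range and IQR, as in the table above."""
    s = sorted(values)
    q1 = quantile_linear(s, 0.25)
    q3 = quantile_linear(s, 0.75)
    return {
        "min": s[0],
        "Q1": q1,
        "median": quantile_linear(s, 0.5),
        "Q3": q3,
        "max": s[-1],
        "range": s[-1] - s[0],  # Maximum - Minimum
        "IQR": q3 - q1,         # Q3 - Q1
    }

print(quantile_summary([1, 2, 3, 4]))
# {'min': 1, 'Q1': 1.75, 'median': 2.5, 'Q3': 3.25, 'max': 4, 'range': 3, 'IQR': 1.5}
```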

Descriptive statistics

Standard deviation: 9.422987709
Coefficient of variation (CV): 0.6467429629
Kurtosis: -1.082868996
Mean: 14.56991146
Median Absolute Deviation (MAD): 8
Skewness: 0.2038579466
Sum: 42775060
Variance: 88.79269736
Monotonicity: Increasing
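Several of these statistics follow from one another. The variance is the square of the standard deviation (9.422987709² ≈ 88.7927), and the coefficient of variation is the standard deviation divided by the mean (9.422987709 / 14.56991146 ≈ 0.6467). The minimal sketch below checks both against the reported date_block_num figures. It also computes MAD as the median of absolute deviations from the median. That reading of MAD is an assumption, though it fits the integer MAD values reported for integer-valued columns.

```python
import statistics

def coefficient_of_variation(std: float, mean: float) -> float:
    """CV = standard deviation / mean (undefined when the mean is 0)."""
    return std / mean

def median_absolute_deviation(values: list[float]) -> float:
    """Median of the absolute deviations from the median (no scale factor)."""
    med = statistics.median(values)
    return statistics.median([abs(v - med) for v in values])

# Reported date_block_num statistics
std, mean = 9.422987709, 14.56991146
print(round(coefficient_of_variation(std, mean), 6))  # 0.646743 (report: 0.6467429629)
print(round(std ** 2, 4))                              # 88.7927 (report: 88.79269736)

print(median_absolute_deviation([1, 1, 2, 2, 4, 6, 9]))  # 1
```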
[Figure: histogram with fixed size bins (bins=34)]
Common values
Value                 Count     Frequency (%)
11                    143246    4.9%
23                    130786    4.5%
2                     121347    4.1%
0                     115690    3.9%
1                     108613    3.7%
7                     104772    3.6%
6                     100548    3.4%
5                     100403    3.4%
12                    99349     3.4%
10                    96736     3.3%
Other values (24)     1814359   61.8%

Minimum 5 values
Value                 Count     Frequency (%)
0                     115690    3.9%
1                     108613    3.7%
2                     121347    4.1%
3                     94109     3.2%
4                     91759     3.1%

Maximum 5 values
Value                 Count     Frequency (%)
33                    53514     1.8%
32                    50588     1.7%
31                    57029     1.9%
30                    55549     1.9%
29                    54617     1.9%
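Each common-values table lists the most frequent values with their counts and their share of all 2935849 rows. Every remaining distinct value is then folded into a single "Other values (k)" row. For date_block_num, the 10 listed values plus 24 others make up all 34 distinct values, and the counts sum to the row total. Below is a minimal standard-library sketch of building such a table. The function name and the top-10 default are illustrative choices that match the tables above; this is not pandas-profiling's own code.

```python
from collections import Counter

def common_values(values: list, top: int = 10) -> list[tuple]:
    """(value, count, percent) rows for the `top` most frequent values,
    followed by an ('Other values (k)', count, percent) row for the rest."""
    counts = Counter(values)
    n = len(values)
    rows = [(v, c, 100.0 * c / n) for v, c in counts.most_common(top)]
    rest = len(counts) - len(rows)  # number of distinct values not listed
    if rest > 0:
        other = n - sum(c for _, c, _ in rows)
        rows.append((f"Other values ({rest})", other, 100.0 * other / n))
    return rows

for value, count, pct in common_values([5, 5, 5, 1, 1, 2, 3, 4], top=2):
    print(f"{value!s:<18} {count:>4} {pct:5.1f}%")
```

For this toy input the rows are (5, 3, 37.5), (1, 2, 25.0) and ('Other values (3)', 3, 37.5).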
 

shop_id
Real number (ℝ≥0)

Distinct: 60
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 33.00172829
Minimum: 0
Maximum: 59
Zeros: 9857
Zeros (%): 0.3%
Memory size: 22.4 MiB

Quantile statistics

Minimum: 0
5th percentile: 6
Q1: 22
median: 31
Q3: 47
95th percentile: 57
Maximum: 59
Range: 59
Interquartile range (IQR): 25

Descriptive statistics

Standard deviation: 16.22697305
Coefficient of variation (CV): 0.4917007044
Kurtosis: -1.025358056
Mean: 33.00172829
Median Absolute Deviation (MAD): 13
Skewness: -0.07236142921
Sum: 96888091
Variance: 263.3146543
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50)]
Common values
Value                 Count     Frequency (%)
31                    235636    8.0%
25                    186104    6.3%
54                    143480    4.9%
28                    142234    4.8%
57                    117428    4.0%
42                    109253    3.7%
27                    105366    3.6%
6                     82663     2.8%
58                    71441     2.4%
56                    69573     2.4%
Other values (50)     1672671   57.0%

Minimum 5 values
Value                 Count     Frequency (%)
0                     9857      0.3%
1                     5678      0.2%
2                     25991     0.9%
3                     25532     0.9%
4                     38242     1.3%

Maximum 5 values
Value                 Count     Frequency (%)
59                    42108     1.4%
58                    71441     2.4%
57                    117428    4.0%
56                    69573     2.4%
55                    34769     1.2%
 

item_id
Real number (ℝ≥0)

Distinct: 21807
Distinct (%): 0.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 10197.22706
Minimum: 0
Maximum: 22169
Zeros: 1
Zeros (%): < 0.1%
Memory size: 22.4 MiB

Quantile statistics

Minimum: 0
5th percentile: 1540
Q1: 4476
median: 9343
Q3: 15684
95th percentile: 20949
Maximum: 22169
Range: 22169
Interquartile range (IQR): 11208

Descriptive statistics

Standard deviation: 6324.297354
Coefficient of variation (CV): 0.6201977575
Kurtosis: -1.225209966
Mean: 10197.22706
Median Absolute Deviation (MAD): 5492
Skewness: 0.2571735482
Sum: 2.993751886e+10
Variance: 39996737.02
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50)]
Common values
Value                 Count     Frequency (%)
20949                 31340     1.1%
5822                  9408      0.3%
17717                 9067      0.3%
2808                  7479      0.3%
4181                  6853      0.2%
7856                  6602      0.2%
3732                  6475      0.2%
2308                  6320      0.2%
4870                  5811      0.2%
3734                  5805      0.2%
Other values (21797)  2840689   96.8%

Minimum 5 values
Value                 Count     Frequency (%)
0                     1         < 0.1%
1                     6         < 0.1%
2                     2         < 0.1%
3                     2         < 0.1%
4                     1         < 0.1%

Maximum 5 values
Value                 Count     Frequency (%)
22169                 1         < 0.1%
22168                 6         < 0.1%
22167                 1114      < 0.1%
22166                 270       < 0.1%
22165                 2         < 0.1%
 

item_price
Real number (ℝ)

Distinct: 19993
Distinct (%): 0.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 890.8532327
Minimum: -1
Maximum: 307980
Zeros: 0
Zeros (%): 0.0%
Memory size: 22.4 MiB

Quantile statistics

Minimum: -1
5th percentile: 99
Q1: 249
median: 399
Q3: 999
95th percentile: 2690
Maximum: 307980
Range: 307981
Interquartile range (IQR): 750

Descriptive statistics

Standard deviation: 1729.799631
Coefficient of variation (CV): 1.941733573
Kurtosis: 445.5328258
Mean: 890.8532327
Median Absolute Deviation (MAD): 250
Skewness: 10.7504227
Sum: 2615410572
Variance: 2992206.762
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50)]
Common values
Value                 Count     Frequency (%)
299                   291352    9.9%
399                   242603    8.3%
149                   218432    7.4%
199                   184044    6.3%
349                   101461    3.5%
599                   95673     3.3%
999                   82784     2.8%
799                   77882     2.7%
249                   77685     2.6%
699                   76493     2.6%
Other values (19983)  1487440   50.7%

Minimum 5 values
Value                 Count     Frequency (%)
-1                    1         < 0.1%
0.07                  2         < 0.1%
0.0875                1         < 0.1%
0.09                  1         < 0.1%
0.1                   2932      0.1%

Maximum 5 values
Value                 Count     Frequency (%)
307980                1         < 0.1%
59200                 1         < 0.1%
50999                 1         < 0.1%
49782                 1         < 0.1%
42990                 4         < 0.1%
 

item_cnt_day
Real number (ℝ)

SKEWED

Distinct: 198
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.242640885
Minimum: -22
Maximum: 2169
Zeros: 0
Zeros (%): 0.0%
Memory size: 22.4 MiB

Quantile statistics

Minimum: -22
5th percentile: 1
Q1: 1
median: 1
Q3: 1
95th percentile: 2
Maximum: 2169
Range: 2191
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 2.618834431
Coefficient of variation (CV): 2.107474864
Kurtosis: 177478.0988
Mean: 1.242640885
Median Absolute Deviation (MAD): 0
Skewness: 272.8331617
Sum: 3648206
Variance: 6.858293776
Monotonicity: Not monotonic
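The SKEWED warning reflects a skewness of about 272.8. Nearly 90% of rows have item_cnt_day = 1, while a handful reach into the hundreds or thousands (maximum 2169). The sketch below computes the bias-adjusted Fisher-Pearson skewness coefficient, which is the estimator behind pandas' Series.skew. That the report uses exactly this estimator is an assumption, and the function name is illustrative.

```python
import math

def adjusted_skewness(values: list[float]) -> float:
    """Bias-adjusted Fisher-Pearson skewness coefficient G1 (needs n >= 3)."""
    n = len(values)
    if n < 3:
        raise ValueError("skewness needs at least 3 values")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # 2nd central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # 3rd central moment
    if m2 == 0:
        return 0.0  # constant data: no spread, treat skewness as 0
    g1 = m3 / m2 ** 1.5                            # population skewness g1
    return g1 * math.sqrt(n * (n - 1)) / (n - 2)   # small-sample adjustment

print(adjusted_skewness([1, 2, 3]))                # 0.0
print(round(adjusted_skewness([1, 1, 1, 10]), 6))  # 2.0
```

A single large value in a long right tail dominates the cubed deviations. This is how a column whose quartiles all sit at 1 can still report a skewness in the hundreds.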
[Figure: histogram with fixed size bins (bins=50)]
Common values
Value                 Count     Frequency (%)
1                     2629372   89.6%
2                     194201    6.6%
3                     47350     1.6%
4                     19685     0.7%
5                     10474     0.4%
-1                    7252      0.2%
6                     6338      0.2%
7                     4057      0.1%
8                     2903      0.1%
9                     2177      0.1%
Other values (188)    12040     0.4%

Minimum 5 values
Value                 Count     Frequency (%)
-22                   1         < 0.1%
-16                   1         < 0.1%
-9                    1         < 0.1%
-6                    2         < 0.1%
-5                    4         < 0.1%

Maximum 5 values
Value                 Count     Frequency (%)
2169                  1         < 0.1%
1000                  1         < 0.1%
669                   1         < 0.1%
637                   1         < 0.1%
624                   1         < 0.1%
 

day
Real number (ℝ≥0)

Distinct: 31
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 15.85266715
Minimum: 1
Maximum: 31
Zeros: 0
Zeros (%): 0.0%
Memory size: 11.2 MiB

Quantile statistics

Minimum: 1
5th percentile: 2
Q1: 8
median: 16
Q3: 24
95th percentile: 30
Maximum: 31
Range: 30
Interquartile range (IQR): 16

Descriptive statistics

Standard deviation: 8.923482976
Coefficient of variation (CV): 0.5629010495
Kurtosis: -1.222018961
Mean: 15.85266715
Median Absolute Deviation (MAD): 8
Skewness: -0.005873321383
Sum: 46541037
Variance: 79.62854842
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=31)]
Common values
Value                 Count     Frequency (%)
2                     103372    3.5%
7                     102273    3.5%
22                    101345    3.5%
23                    101339    3.5%
8                     100986    3.4%
21                    100208    3.4%
28                    99813     3.4%
3                     99027     3.4%
27                    98952     3.4%
6                     98058     3.3%
Other values (21)     1930476   65.8%

Minimum 5 values
Value                 Count     Frequency (%)
1                     94421     3.2%
2                     103372    3.5%
3                     99027     3.4%
4                     94469     3.2%
5                     95436     3.3%

Maximum 5 values
Value                 Count     Frequency (%)
31                    67601     2.3%
30                    97436     3.3%
29                    90899     3.1%
28                    99813     3.4%
27                    98952     3.4%
 

month
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 12
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.247716759
Minimum: 1
Maximum: 12
Zeros: 0
Zeros (%): 0.0%
Memory size: 11.2 MiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 3
median: 6
Q3: 9
95th percentile: 12
Maximum: 12
Range: 11
Interquartile range (IQR): 6

Descriptive statistics

Standard deviation: 3.536219343
Coefficient of variation (CV): 0.5660018658
Kurtosis: -1.236332881
Mean: 6.247716759
Median Absolute Deviation (MAD): 3
Skewness: 0.09620076191
Sum: 18342353
Variance: 12.50484724
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=12)]
Common values
Value                 Count     Frequency (%)
1                     303561    10.3%
3                     284057    9.7%
12                    274032    9.3%
2                     270251    9.2%
8                     248415    8.5%
6                     237428    8.1%
7                     234857    8.0%
4                     228289    7.8%
10                    227077    7.7%
5                     224836    7.7%
Other values (2)      403046    13.7%

Minimum 5 values
Value                 Count     Frequency (%)
1                     303561    10.3%
2                     270251    9.2%
3                     284057    9.7%
4                     228289    7.8%
5                     224836    7.7%

Maximum 5 values
Value                 Count     Frequency (%)
12                    274032    9.3%
11                    183164    6.2%
10                    227077    7.7%
9                     219882    7.5%
8                     248415    8.5%
 

year
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 11.2 MiB
Common values
Value                 Count     Frequency (%)
2013                  1267562   43.2%
2014                  1055861   36.0%
2015                  612426    20.9%
 
[Figure: frequencies of value counts]

Unique

Unique: 0
Unique (%): 0.0%
[Figure: histogram of lengths of the category]

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

weekday
Real number (ℝ≥0)

ZEROS

Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.365686382
Minimum: 0
Maximum: 6
Zeros: 337074
Zeros (%): 11.5%
Memory size: 22.4 MiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 2
median: 4
Q3: 5
95th percentile: 6
Maximum: 6
Range: 6
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 1.996795044
Coefficient of variation (CV): 0.5932801863
Kurtosis: -1.203917591
Mean: 3.365686382
Median Absolute Deviation (MAD): 2
Skewness: -0.2763606994
Sum: 9881147
Variance: 3.987190447
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=7)]
Common values
Value                 Count     Frequency (%)
5                     590359    20.1%
6                     503104    17.1%
4                     439298    15.0%
3                     367280    12.5%
2                     352962    12.0%
1                     345772    11.8%
0                     337074    11.5%

Minimum 5 values
Value                 Count     Frequency (%)
0                     337074    11.5%
1                     345772    11.8%
2                     352962    12.0%
3                     367280    12.5%
4                     439298    15.0%

Maximum 5 values
Value                 Count     Frequency (%)
6                     503104    17.1%
5                     590359    20.1%
4                     439298    15.0%
3                     367280    12.5%
2                     352962    12.0%
 

yrday
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 365
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 173.6892047
Minimum: 0
Maximum: 364
Zeros: 5194
Zeros (%): 0.2%
Memory size: 22.4 MiB

Quantile statistics

Minimum: 0
5th percentile: 12
Q1: 75
median: 170
Q3: 266
95th percentile: 353
Maximum: 364
Range: 364
Interquartile range (IQR): 191

Descriptive statistics

Standard deviation: 108.698348
Coefficient of variation (CV): 0.6258209781
Kurtosis: -1.20685451
Mean: 173.6892047
Median Absolute Deviation (MAD): 95
Skewness: 0.1107832354
Sum: 509925278
Variance: 11815.33086
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50)]
Common values
Value                 Count     Frequency (%)
1                     18652     0.6%
363                   18462     0.6%
52                    17723     0.6%
2                     17120     0.6%
53                    16818     0.6%
361                   16804     0.6%
364                   16112     0.5%
362                   15840     0.5%
3                     15588     0.5%
4                     15057     0.5%
Other values (355)    2767673   94.3%

Minimum 5 values
Value                 Count     Frequency (%)
0                     5194      0.2%
1                     18652     0.6%
2                     17120     0.6%
3                     15588     0.5%
4                     15057     0.5%

Maximum 5 values
Value                 Count     Frequency (%)
364                   16112     0.5%
363                   18462     0.6%
362                   15840     0.5%
361                   16804     0.6%
360                   14252     0.5%
 

item_category_id
Real number (ℝ≥0)

Distinct: 84
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 40.0013829
Minimum: 0
Maximum: 83
Zeros: 3
Zeros (%): < 0.1%
Memory size: 22.4 MiB

Quantile statistics

Minimum: 0
5th percentile: 19
Q1: 28
median: 40
Q3: 55
95th percentile: 71
Maximum: 83
Range: 83
Interquartile range (IQR): 27

Descriptive statistics

Standard deviation: 17.10075855
Coefficient of variation (CV): 0.4275041838
Kurtosis: -0.5251578565
Mean: 40.0013829
Median Absolute Deviation (MAD): 15
Skewness: 0.3182825248
Sum: 117438020
Variance: 292.435943
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50)]
Common values
Value                 Count     Frequency (%)
40                    564652    19.2%
30                    351591    12.0%
55                    339585    11.6%
19                    208219    7.1%
37                    192674    6.6%
23                    146789    5.0%
28                    121539    4.1%
20                    79058     2.7%
63                    53845     1.8%
65                    53227     1.8%
Other values (74)     824670    28.1%

Minimum 5 values
Value                 Count     Frequency (%)
0                     3         < 0.1%
1                     2         < 0.1%
2                     18461     0.6%
3                     25283     0.9%
4                     2304      0.1%

Maximum 5 values
Value                 Count     Frequency (%)
83                    7206      0.2%
82                    4390      0.1%
81                    795       < 0.1%
80                    1325      < 0.1%
79                    9067      0.3%
 

Interactions

[Figures: 100 pairwise interaction plots covering the 10 × 10 pairs of numeric variables (not reproduced)]

Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables (for positive scale factors; a negative factor flips the sign of r). Consequently, for an exact linear relationship the slope of the line does not affect r: every increasing line gives r = +1 and every decreasing line gives r = -1.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
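Below is a minimal sketch of that definition in plain Python. The function and variable names are illustrative, not part of the report. The code divides the centred cross-product sum by the product of the centred root sums of squares. This equals cov(X, Y) / (σX σY) because the 1/n factors cancel. It also checks that shifting X and rescaling it by a positive factor leaves r unchanged.

```python
import math

def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Covariance of X and Y divided by the product of their standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Centred sums; the common 1/n factor cancels in the ratio below.
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]
r = pearson_r(x, y)
# Shifting X by 10 and scaling it by 3 leaves r unchanged.
print(math.isclose(r, pearson_r([10 + 3 * v for v in x], y)))  # True
# Any exactly linear, decreasing relationship gives r = -1.
print(round(pearson_r(x, [-2 * v + 7 for v in x]), 12))  # -1.0
```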

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
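Below is a minimal sketch of that recipe in plain Python. It assumes tied values receive the average of the ranks they occupy, a common convention and the default of pandas' rank. The helper names are illustrative. For a nonlinear but strictly increasing relationship, ρ is exactly 1 while Pearson's r is not.

```python
import math

def average_ranks(xs: list[float]) -> list[float]:
    """1-based ranks; tied values share the average of the positions they span."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to xs[order[i]].
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Covariance divided by the product of standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def spearman_rho(xs: list[float], ys: list[float]) -> float:
    """Pearson's r applied to the rank variables of X and Y."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # nonlinear but strictly increasing
print(round(spearman_rho(x, y), 12))  # 1.0
print(pearson_r(x, y) < 1.0)          # True: the curve is not a straight line
```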

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is then the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n-1)/2. When ties are present, the τ-b variant adjusts the denominator for tied pairs.
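Below is a direct O(n²) sketch of that formula (often called τ-a) in plain Python. The function name is illustrative. For millions of rows, as in this dataset, an O(n log n) algorithm or a library routine would be needed instead.

```python
from itertools import combinations

def kendall_tau_a(xs: list[float], ys: list[float]) -> float:
    """(concordant - discordant) / total pairs, visiting every pair once."""
    n = len(xs)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        s = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if s > 0:
            concordant += 1  # pair ordered the same way in X and Y
        elif s < 0:
            discordant += 1  # pair ordered in opposite ways
        # s == 0: a tie in X or Y counts as neither
    return (concordant - discordant) / (n * (n - 1) / 2)

# One swapped pair out of six: (5 - 1) / 6
print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))  # 0.6666666666666666
```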

Phik (φk)

Phik (φk) is a relatively new, practical correlation coefficient that works consistently across categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient for a bivariate normal input distribution. Extensive documentation is available for the phik package.

Missing values

[Figures: missing-value plots (not reproduced); the dataset has no missing cells]

Sample

First rows

   date        date_block_num  shop_id  item_id  item_price  item_cnt_day  day  month  year  weekday  yrday  item_category_id
0  2013-01-02  0               59       22154    999.00      1.0           2    1      2013  2        1      37
1  2013-01-03  0               25       2552     899.00      1.0           3    1      2013  3        2      58
2  2013-01-05  0               25       2552     899.00      -1.0          5    1      2013  5        4      58
3  2013-01-06  0               25       2554     1709.05     1.0           6    1      2013  6        5      58
4  2013-01-15  0               25       2555     1099.00     1.0           15   1      2013  1        14     56
5  2013-01-10  0               25       2564     349.00      1.0           10   1      2013  3        9      59
6  2013-01-02  0               25       2565     549.00      1.0           2    1      2013  2        1      56
7  2013-01-04  0               25       2572     239.00      1.0           4    1      2013  4        3      55
8  2013-01-11  0               25       2572     299.00      1.0           11   1      2013  4        10     55
9  2013-01-03  0               25       2573     299.00      3.0           3    1      2013  3        2      55

Last rows

         date        date_block_num  shop_id  item_id  item_price  item_cnt_day  day  month  year  weekday  yrday  item_category_id
2935839  2015-10-24  33              25       7315     399.0       1.0           24   10     2015  5        296    55
2935840  2015-10-31  33              25       7409     299.0       1.0           31   10     2015  5        303    55
2935841  2015-10-11  33              25       7393     349.0       1.0           11   10     2015  6        283    55
2935842  2015-10-10  33              25       7384     749.0       1.0           10   10     2015  5        282    55
2935843  2015-10-09  33              25       7409     299.0       1.0           9    10     2015  4        281    55
2935844  2015-10-10  33              25       7409     299.0       1.0           10   10     2015  5        282    55
2935845  2015-10-09  33              25       7460     299.0       1.0           9    10     2015  4        281    55
2935846  2015-10-14  33              25       7459     349.0       1.0           14   10     2015  2        286    55
2935847  2015-10-22  33              25       7440     299.0       1.0           22   10     2015  3        294    57
2935848  2015-10-03  33              25       7460     299.0       1.0           3    10     2015  5        275    55
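The sample rows suggest how the date-derived columns were computed. For example, 2013-01-02 (a Wednesday) has weekday 2 and yrday 1, and 2015-10-24 has date_block_num 33. The sketch below reproduces the sample values under three inferred rules: weekday is counted from Monday = 0, yrday is the 0-based day of the year, and date_block_num is the number of months since January 2013. These rules are inferred from the rows shown here rather than documented in the report, and the function name is hypothetical.

```python
from datetime import date

def derived_columns(d: date) -> dict[str, int]:
    """Recompute the date-derived columns under the inferred rules."""
    return {
        "date_block_num": (d.year - 2013) * 12 + (d.month - 1),  # months since 2013-01
        "day": d.day,
        "month": d.month,
        "year": d.year,
        "weekday": d.weekday(),              # Monday = 0 ... Sunday = 6
        "yrday": d.timetuple().tm_yday - 1,  # 0-based day of the year
    }

print(derived_columns(date(2015, 10, 24)))
# {'date_block_num': 33, 'day': 24, 'month': 10, 'year': 2015, 'weekday': 5, 'yrday': 296}
```

Under these rules, year is a function of date_block_num. In the non-leap years 2013 to 2015, month is also a function of yrday. Both relationships are consistent with the high-correlation warnings in the overview.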

Duplicate rows

Most frequent

   date        date_block_num  shop_id  item_id  item_price  item_cnt_day  day  month  year  weekday  yrday  item_category_id  count
0  2013-01-05  0               54       20130    149.0       1.0           5    1      2013  5        4      40                2
1  2014-02-23  13              50       3423     999.0       1.0           23   2      2014  6        53     23                2
2  2014-03-23  14              21       3423     999.0       1.0           23   3      2014  6        81     23                2
3  2014-05-01  16              50       3423     999.0       1.0           1    5      2014  3        120    23                2
4  2014-07-12  18              25       3423     999.0       1.0           12   7      2014  5        192    23                2
5  2014-12-31  23              42       21619    499.0       1.0           31   12     2014  2        364    37                2
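The 6 duplicate rows reported in the overview match this table: six distinct rows each occur twice, so each contributes one extra copy. Below is a sketch of that count using the standard library. It assumes that "duplicate rows" means rows identical to an earlier row, which is the behaviour of pandas' DataFrame.duplicated() with its default keep='first'. The example uses only the first few columns of two sample rows for brevity, and the function name is illustrative.

```python
from collections import Counter

def duplicate_summary(rows: list[tuple]) -> tuple[int, dict[tuple, int]]:
    """Return (number of duplicate rows, {row: count} for rows seen more than once)."""
    counts = Counter(tuple(r) for r in rows)
    repeated = {row: c for row, c in counts.items() if c > 1}
    # A row occurring c times contributes c - 1 copies beyond its first occurrence.
    n_duplicates = sum(c - 1 for c in repeated.values())
    return n_duplicates, repeated

rows = [
    ("2013-01-05", 0, 54, 20130, 149.0, 1.0),
    ("2013-01-05", 0, 54, 20130, 149.0, 1.0),
    ("2013-01-06", 0, 25, 2554, 1709.05, 1.0),
]
print(duplicate_summary(rows))
# (1, {('2013-01-05', 0, 54, 20130, 149.0, 1.0): 2})
```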